Abstract. Large-scale systems are part of a growing trend in distributed computing, and coordinating control of them is an
increasing challenge. This paper presents a cooperative agent system that scales to one million or more nodes in which agents
form coalitions to complete global task objectives. This approach uses the large-scale Command and Control (C2) capabilities
of the Resource Clustered Chord (RC-Chord) Hierarchical Peer-to-Peer (HP2P) design. Tasks are submitted that require access
to processing, data, or hardware resources, and a distributed agent search is performed to recruit agents to satisfy the distributed
task. This approach differs from others by incorporating design elements to accommodate large-scale systems into the resource
location algorithm. PeerSim simulations demonstrate that the distributed coalition formation algorithm is as effective as an
omnipotent central algorithm in a one million agent system.

1. Introduction

Deployed Peer-to-Peer (P2P) systems now commonly exceed one million
simultaneous nodes [23]. Significant research efforts continue to optimize
system redundancy and speed of querying the data at these scales [5]. As
enterprises collect more and more data, the use of data mining to identify
trends becomes more attractive [35]. Because the P2P agents store the data,
they can also be leveraged to distribute the data mining computation.
However, at one million agents, such a large-scale system requires a new
method of organizing the agents and algorithms into task agent coalitions.
Both resources (computation and data present at the agent) must be included
in the tasking process. For scalability, the agents operating under these
conditions must be flexible, cooperative, and multi-taskable.

This paper addresses the problem of cooperative task Command and Control
(C2) of a large-scale Distributed Multi-Agent System (DMAS). The system is
composed of cooperative multi-taskable agents. Cooperative agents seek to
maximize global utility, rather than personal gains (i.e., they are not
self-interested or greedy). This requirement provides honesty, and enforces
the property that any bids or statements of available resources by an agent
toward a coalition proposal include all available resources the agent
provides. This makes the process scalable, as the models governing
negotiations do not include competitive bartering and bidding for tasks.

The primary contribution of this paper is the Distributed Likelihood of
Execution (DLoE) algorithm. The DLoE algorithm uses a coalition formation
task scheduling model to maximize the work throughput in the system. The
DLoE algorithm assigns tasks to agents based on the task’s expected
Likelihood of Execution (LoE) at the particular agents. The LoE is computed
from an agent’s scheduling data, and represents potential resource
contentions. The scheduling data uses a small amount of overhead at each
agent, and is updated periodically to create a general view of agent
scheduling data for subgraphs of the Hierarchical Peer-to-Peer (HP2P)
topology.

The HP2P overlay leveraged in this paper is the Resource Clustered Chord
(RC-Chord) HP2P resource management overlay [17]. RC-Chord provides a robust
network organization framework with two hierarchies for organizing and
locating agents by both address and available resources. The RC-Chord
overlay is extended to maintain the scheduling data for the clusters used by
the DLoE algorithm. Because the DLoE algorithm is built on RC-Chord, it
inherits the redundancy and the robustness to peer churn of the RC-Chord
overlay.

Simulations exercise the DLoE coalition formation algorithm on systems of
one million agents. Results are compared against an omnipotent fully
centralized optimal algorithm and a greedy algorithm [2]. The greedy
algorithm performs worst, forming task coalitions that execute tasks up to
25% slower than the other algorithms. The DLoE algorithm consistently
outperforms the greedy algorithm, and yields overall performance to within
one standard deviation of the centralized optimal algorithm’s results. The
results of DLoE testing are encouraging, and lay a framework for future use
in large-scale DMAS application suites.

The following section discusses the coalition formation problem definition
of Abdallah and Lesser [1]. Section 3 presents related work on coalition
formation, multi-robot task allocation, and resource coordination in
peer-to-peer networks. The DLoE algorithm is presented in Section 4, leading
into experimental setup in Section 5, results in Section 6, and conclusions
and recommendations in Section 7.


2.  Coalition  Formation 

Coalition formation focuses on the construction of teams of agents to
execute tasks, with the goal of employing the capabilities and assets of
under-utilized agents to achieve larger and more sophisticated tasks. A task
is defined as a function, with a desired end state, that requires one or
more agents and resources to complete.

Forming optimal coalitions requires input from each agent in the system, and
is an NP-complete problem [32]. As defined by Abdallah and Lesser [1],
consider the set of tasks $T = \{T_1, T_2, \ldots, T_q\}$. Each task $T_i$
is defined as $T_i = (u_i, rr_{i,1}, \ldots, rr_{i,m})$, where $u_i$ is the
utility gained for accomplishing task $T_i$ and $rr_{i,k}$ is the amount of
resource $k$ required by task $T_i$. The set of agents is
$I = \{I_1, I_2, \ldots, I_n\}$, where each agent
$I_i = (cr_{i,1}, cr_{i,2}, \ldots, cr_{i,m})$, and $cr_{i,k}$ is the amount
of resource $k$ possessed by agent $i$.

The coalition formation problem is defined as the allocation of the subset
of tasks $S \subseteq T$ to agents that maximizes the global utility, $U$,

$$U = \sum_{i \mid T_i \in S} u_i. \tag{1}$$

Task allocation algorithms build a set of coalitions
$C = \{C_1, \ldots, C_{|S|}\}$, where $C_i \subseteq I$ is the coalition
assigned to task $T_i$, such that each task coalition provides enough
resources of each type to satisfy that task’s requirements:

$$\forall T_i \in S, \forall k : \sum_{I_j \in C_i} cr_{j,k} \ge rr_{i,k}. \tag{2}$$

A constraint on the problem is that each agent is capable of only executing
a single task:

$$\forall i \ne j : C_i \cap C_j = \emptyset. \tag{3}$$

This form of the coalition formation problem assumes single-task agents
[11], “all or none” resource allocation, and exponential time coalition
formation due to task group enumeration [27]. These properties are modified
to provide the ability to scale the tasking of a cooperative coalition on an
HP2P network.
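
As an illustration of Equations 1 through 3, the following Python sketch scores and validates a candidate allocation. It is not taken from the paper: the function names, the dictionary-based layout, and the agent and task labels are assumptions chosen for this example.

```python
from typing import Dict, List, Set


def global_utility(utilities: Dict[str, float], allocated: Set[str]) -> float:
    """Equation 1: sum the utility u_i of every task T_i in the subset S."""
    return sum(utilities[task] for task in allocated)


def is_feasible(
    required: Dict[str, List[float]],   # rr[i][k]: resource k needed by task i
    possessed: Dict[str, List[float]],  # cr[j][k]: resource k held by agent j
    coalitions: Dict[str, Set[str]],    # C_i: agents assigned to task i
) -> bool:
    """Check Equations 2 and 3 for a candidate set of coalitions."""
    already_assigned: Set[str] = set()
    for task, members in coalitions.items():
        # Equation 3: coalitions are disjoint (one task per agent).
        if already_assigned & members:
            return False
        already_assigned |= members
        # Equation 2: the coalition covers every resource requirement.
        for k, need in enumerate(required[task]):
            if sum(possessed[agent][k] for agent in members) < need:
                return False
    return True
```

With two resources, a task requiring (2, 1) is satisfied by agents holding (1, 1) and (1, 0) together, but the check fails as soon as one of those agents also appears in a second coalition.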

3.  Related  Work 

This section summarizes existing approaches to solving the coalition
formation problem. It begins with traditional solutions and moves into
related forms. Toward the solution developed here, this section also
introduces P2P overlays and the RC-Chord HP2P structured overlay.

3.1.  Coalition  Formation 

Shehory and Kraus [30] describe two methods for coalition formation using
reward incentives. In a negotiation-based formation, all single agent
coalitions begin by interacting with other agents to determine if forming a
joint coalition can yield a higher payout than remaining alone. In the case
that the two agents both determine their profit can be increased by forming
a coalition with each other, they negotiate a sharing of the additional
payout yielded by forming the coalition. The agents negotiate a fair split
of the profits based on greedy [9] or other semantics, and the payout may be
different for the two agents. In the negotiation algorithm, this process
occurs between all pairs of agents, and each agent attempts to form a
coalition with its most profitable partner.

The second algorithm builds upon the Shapley formula [37]. This is a
centralized algorithm in which a single agent collects information about the
resources and other relevant information from all other agents in the
system. The agent then calculates the Shapley value, which involves finding
the payout values of all $2^n$ pairs of agents. These payouts are organized
into a prioritized data structure, and all agents are then informed of the
new coalition schedules. This centralized algorithm requires $O(n)$
communications (it contacts each agent twice) and $O(2^n)$ computations.

The Contract Net Protocol (CNP) [33] is a contract system to allocate tasks,
or portions of tasks, to one or more agents. Given a system of agents, any
agent with a surplus of work to perform may start an inverted blind auction
(contract proposal) for which other agents with a surplus of resources can
bid. The bidder with the most attractive offer (lowest payout) is awarded
the contract. Agents form networks of auctions, and may join and leave them
at will. This concept can be applied in an HP2P structure, where agents are
naturally organized into clusters. This method has been extended to build
upon more modern communications facilities, such as the ordered delivery of
TCP and higher assumed bandwidth, easing constraints in the original
protocol [29]. The CNP is useful in both heterogeneous and homogeneous
systems in which agents do not have full information about other agents.
Rather, the agents submit themselves as candidates for processing a certain
task, based on availability and capabilities, without revealing their full
state information.

The coalition formation problem can also be considered a variant of the task
allocation problem. The Multi-Robot Task Allocation (MRTA) problem [10] is
stated as follows: given $m$ robots, each capable of executing one or more
tasks, and $n$ weighted tasks, each requiring one or more robots, the goal
is to assign robots to tasks to maximize the overall expected performance,
taking into account the priorities of the tasks and the efficiency ratings
of the robots [3]. The MRTA problem is NP-hard [9,11].

This research effort examines instantiations of the multi-robot multi-task
environment, in which agents are capable of performing tasks requiring
either one or more agents, and with each agent capable of performing one or
more simultaneous tasks. Tasks will be introduced at runtime (online
assignment), and the form and goals of those tasks are not known ahead of
time. Application of this paradigm to multi-agent systems is not yet fully
understood, and applying it to large-scale multi-agent systems remains an
open problem.

These concepts are leveraged and extended in the design of the DLoE
algorithm. This algorithm uses the properties of an HP2P overlay, combined
with simplifying assumptions and a heuristic to support coalition formation
algorithms in a large-scale system. The DLoE algorithm is covered in more
detail in Section 4.2.

3.2.  Large  Scale  P2P  Overlays 

A communications overlay is the set of protocols and algorithms necessary to
build and maintain a topology of nodes in such a way as to guarantee a set
of performance parameters. In the context of P2P technologies, overlay
structures describe the formation of nodes into a system of peers capable of
identifying and locating remote nodes without foreknowledge of their exact
location or being certain if the requested targets exist. Such a
consideration is necessary in many environments where the scale of those
systems is large enough to prevent global knowledge. First generation
systems solved this problem by query broadcast [8,6], but this solution
fails to scale. Newer systems have developed more advanced techniques for
locating remote nodes, and the utility of such systems has brought about the
emergence of mainstream P2P applications [28,21,34,4,14,13,15].

ML-Chord [22] applies these technologies to create a system organized by
available resources. ML-Chord is a two-layer HP2P overlay network, where the
top (or bridge) layer joins super peers from each of the clusters at the
second layer. These Category Layer clusters each represent a single
resource, and agents join one or more clusters based on their available
resource(s). Chord is used as the base agent location protocol, and is
suitably modified to accommodate search by resource category. Two notable
disadvantages of ML-Chord are its fixed size (two layers), and limited
scalability for large-scale systems. RC-Chord extends ML-Chord to address
these limitations, and is the framework for the DLoE coalition formation
algorithm.

3.3.  RC-Chord 

RC-Chord [17] is an HP2P overlay network based on the Chord protocol [34].
It incorporates the ability to scale to many levels, with each level
composed of one or more clusters. Each cluster is a stand-alone instance of
Chord, and connects to a cluster in the next higher level of the hierarchy
through a set of super peers. A cluster may have zero or more sub-clusters
attached to it, forming a tree from a single super cluster root.

RC-Chord associates each agent with one or more resources, and each cluster,
with the exception of the super cluster, represents a single resource.
Agents connected to a particular cluster all have that cluster’s resource in
common, and agents join a cluster for each resource they possess. The super
cluster includes multiple agents from each resource sub-graph to form the
root of each resource hierarchy. RC-Chord supports searching for agents by
global identifier or resource. The hierarchy grows and shrinks dynamically
to accommodate network churn and abundant resources. An abundant resource is
a resource that many or all agents in a system may possess, such as
processor time.

Each cluster consists of a set of super peers, in addition to the larger
percentage of normal peers. Cluster super peers are responsible for message
routing and maintaining a shared database of resource availability for their
leaf node agents. All requests and obligations of resources are processed by
cluster super peers. Resource requests implicitly carry an intent to
obligate, thus avoiding the need for multi-stage transactions. Instead, each
request is examined by a super peer to ensure available resources, and an
obligation request is sent to leaf peers for a quantity of the resource.
Leaf peers respond with either accept or deny, and on acceptance the
resource is considered obligated. Once the super peer has collected the
necessary quantity of the resource, the shared database is updated, and
replies are sent to those parties involved. In the case of the requestor,
the details of the request are returned (which agents, and how much of each
resource at those agents), and the leaf peers receive a session identifier
(including obligation duration, requestor, etc.). This is a simplification
of the process, as error checking and contention issues are omitted for
brevity.
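
The single-stage obligation exchange described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not RC-Chord’s implementation: the `LeafPeer` class, the `request_resource` function, and the first-come ordering of leaf peers are all hypothetical, and, as in the text, error checking and contention handling are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LeafPeer:
    name: str
    free: int  # units of the cluster's resource not yet obligated

    def obligate(self, amount: int) -> bool:
        """Accept the obligation if enough is free; otherwise deny."""
        if amount <= self.free:
            self.free -= amount
            return True
        return False


def request_resource(leaves: List[LeafPeer], quantity: int) -> Optional[Dict[str, int]]:
    """Super peer side: obligate `quantity` units across leaf peers.

    Returns {leaf name: units obligated}, or None when the cluster's
    recorded availability cannot cover the request.
    """
    # The request is first checked against the availability database,
    # modeled here as the sum of the leaves' free amounts.
    if sum(leaf.free for leaf in leaves) < quantity:
        return None
    grants: Dict[str, int] = {}
    remaining = quantity
    for leaf in leaves:
        if remaining == 0:
            break
        amount = min(leaf.free, remaining)
        # The request itself carries the intent to obligate (single stage).
        if amount > 0 and leaf.obligate(amount):
            grants[leaf.name] = amount
            remaining -= amount
    return grants
```

The returned mapping plays the role of the reply sent to the requestor: which agents were obligated and for how much.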

Given the RC-Chord system parameters, each agent knows roughly how many
clusters exist in the levels below it. In addition, super peers receive
periodic updates about the average quantities for all agents containing
specific resources. This value is used to calculate the average LoE for the
new task for the local cluster and those clusters in lower levels.

Figure 1 shows an example RC-Chord instance with seven clusters. Three
resources are present, with all three represented in the super cluster. When
a resource’s agent population high-threshold is exceeded at the super
cluster, a new cluster for that resource is created at the second level.
Once a cluster at the second level is filled, agents joining the system with
that resource are attached to a new cluster at the next level of the
hierarchy. Figure 1 shows a single level-two cluster for each of the
system’s three resources. Agents possessing resource three have continued to
join the system, and new clusters for that resource were created at level
three. This process repeats, with sub-graphs of each resource growing
outward from the super cluster to accommodate new agents joining the system.


Fig. 1. An example RC-Chord instance with three levels. The super cluster
exists as the sole level-one cluster. Its agents serve as super peers for
level-two clusters, with a single level-two cluster for each resource. Each
resource subgraph may extend downward into additional levels as necessary,
with a branching factor proportional to the ratio of peers to super peers
and the size of each cluster.


4.  Methodology 

This section introduces extensions to the coalition formation problem to
form the cooperative coalition formation problem. The additions redefine
agents to be multi-taskable, allow agents to share full or partial resources
between tasks, and allow tasks to split allocations of a resource between
multiple agents.

The DLoE algorithm, which is used here to solve the cooperative coalition
formation problem in a large-scale DMAS, is also presented. The algorithm is
built upon the RC-Chord structured HP2P overlay and borrows concepts from
contract protocols. A key distinction between existing contract protocols
and the DLoE algorithm is that the DLoE algorithm does not use negotiations.
Instead, the algorithm uses knowledge of agent workloads and assigns tasks
to agents to maximize work throughput and minimize task duration.

4.1.  Peer-to-Peer  Task  Model 

The P2P task model is designed to accommodate the diversity of the tasks
expected to be executed within a large-scale multi-agent system. The
modifications to Abdallah and Lesser’s model are the redefinition of utility
as work, the introduction of task priority, and the addition of task
synchrony to model practical distributed algorithms. These changes reflect
the scale of the new environment in which they operate, wherein global
knowledge and synchronization are likely no longer achievable [18,39].

Without the possibility of global synchronization and due to the growth rate
of the classical coalition formation problem, this task model introduces the
idea of work to motivate the decision processes of agents. Rather than
spending long periods of time and bandwidth during the coalition formation
negotiation process, agents instead focus on forming coalitions quickly and
leveraging the scale of the system to achieve maximum useful work
throughput. To this end, the idea of utility is replaced by a more tangible
unit called work. Each task, $T_i$, specifies an amount of work, $w_i$, that
must be performed to complete the task.

Tasks are defined as $T_i = (w_i, p_i, s_i, rr_{i,1}, \ldots, rr_{i,m})$,
where $w_i$ is the number of units of work necessary to complete task $T_i$,
$p_i$ is the task priority, and $s_i$ is the task synchrony.

Each agent is capable of executing one unit of work per time unit. Since
agents are multi-taskable, they may be members of multiple coalitions and
must choose which task to execute at each time step. The task priority,
$p_i$, provides a mechanism for runtime tuning, as well as scheduling
fidelity. Task priorities for this model fit within a range of [1, 10], with
10 being the highest priority. At each time step, agents with multiple tasks
use the priority and a decision process (Section 4.1.1) to choose which task
to execute. Maximum work throughput is achieved when each agent has a task
to execute at each step, and establishing a local scheduling policy based on
task priority ensures that higher priority tasks are completed first.

Not all tasks are completely parallel [20]; some may require periodic
barrier synchronization points. These barrier synchronization points halt
processing on all agents that have reached the barrier until all other
agents arrive. This generalized mechanic is used to represent scenarios in
which substantial variation exists between the processing capabilities of
individual heterogeneous agents, resource contention arises, or tasks
require frequent updates or synchronization between execution threads.

Each task is assigned a task synchrony value, $s_i$, that specifies the
number of steps each agent can perform before reaching a barrier. Upon
reaching a task synchronization barrier, an agent halts processing on that
task until all other agents assigned to the task reach that barrier. During
that time, the agent at the barrier removes the task from its ready queue,
and instead executes work for other task coalitions of which it is a member.
Once the synchronization barrier has been met by all other agents, the task
becomes ready to execute by all agents in the task coalition.
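
The barrier behavior can be sketched as a small per-task bookkeeping object. The following Python example is an assumption-laden illustration rather than the paper’s implementation: the `TaskBarrier` class and its method names are hypothetical, and the remaining work of the task is not modeled.

```python
from typing import Dict, Iterable


class TaskBarrier:
    """Tracks, for one task, how many steps each coalition member has
    executed since the last synchronization barrier."""

    def __init__(self, agents: Iterable[str], synchrony: int) -> None:
        self.synchrony = synchrony  # s_i: steps allowed before a barrier
        self.steps: Dict[str, int] = {agent: 0 for agent in agents}

    def is_ready(self, agent: str) -> bool:
        """The task stays in the agent's ready queue until it hits the barrier."""
        return self.steps[agent] < self.synchrony

    def execute_step(self, agent: str) -> None:
        """Record one unit of work by `agent` on this task."""
        if not self.is_ready(agent):
            raise RuntimeError(f"{agent} is waiting at the barrier")
        self.steps[agent] += 1
        # When every member has reached the barrier, release all of them.
        if all(count == self.synchrony for count in self.steps.values()):
            for member in self.steps:
                self.steps[member] = 0
```

An agent for which `is_ready` returns false would spend its time step on another coalition’s task, which is how the model keeps agents busy while a barrier is pending.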

To accommodate the allocation of an agent’s resources to multiple
coalitions, Abdallah and Lesser’s model is extended to include
$cr^{i}_{j,k}$ as the portion of agent $j$’s supply of resource $k$ that is
allocated to coalition $i$, which is not to exceed the agent’s total supply
of resource $k$, defined by $cr_{j,k}$ (Equation 4). This allows each agent
the flexibility to participate in multiple coalitions simultaneously. It
also removes the “all or none” property, thus increasing the satisfiability
of agent coalition formation by allowing partial amounts of resources, and
participation in multiple coalitions.

$$\forall I_j \in I, \forall k : \sum_{T_i \in S} cr^{i}_{j,k} \le |cr_{j,k}|. \tag{4}$$

To maximize performance, each agent can contribute only one resource (or
partial resource) to a task. This constraint prohibits the same agent from
joining a task more than once. It eliminates the possibility of contention
in task scheduling and ensures minimum time between barriers.
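
A short Python sketch of the supply constraint in Equation 4 follows. The data layout is an assumption made for illustration: keying each task’s allocations by agent name, with a single (resource, amount) pair per agent, encodes the rule that an agent contributes only one resource to a task. The function name `respects_supply` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# allocations[task][agent] = (resource index k, amount allocated to that task)
Allocations = Dict[str, Dict[str, Tuple[int, float]]]


def respects_supply(allocations: Allocations, supply: Dict[str, List[float]]) -> bool:
    """Equation 4: for every agent and resource, the portions handed out
    across all task coalitions must not exceed the agent's total supply."""
    used: Dict[Tuple[str, int], float] = defaultdict(float)
    for per_agent in allocations.values():
        for agent, (k, amount) in per_agent.items():
            used[(agent, k)] += amount
    return all(total <= supply[agent][k] for (agent, k), total in used.items())
```

For example, an agent with 10 units of resource 0 can give 6 units to one task and 4 to another, but not 6 and 5.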

The cooperative coalition formation problem is defined as the allocation of
the subset of tasks $S \subseteq T$ to agents that maximizes:

$$W(t) = \sum_{i \in I} w_i(t). \tag{5}$$

Note that the global work throughput, $W$, is not resolved with the
formation of task coalitions. Rather, $W$ is time-dependent, and increases
with the number of agents that are able to execute a unit of work per time.
Coalition formation algorithms must therefore focus on the allocation of
tasks to agents such that the number of agents per task is globally
balanced.

4.1.1.  Scheduling 

Each agent is capable of receiving and processing multiple tasks, but may
only execute a single unit of work per unit time. Agents choose which task
to execute from those tasks they possess using a priority-based scheduling
algorithm. Scheduling follows a sampling process in which each task is
weighted based on its priority. Under this roulette wheel sampling, higher
priority tasks have a higher likelihood of being selected to receive
processor time. This scheduling algorithm ensures fairness and progress,
while balancing two optimization parameters: the number of tasks on the
agent and the task priorities.
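
Roulette wheel selection of this kind can be sketched in a few lines of Python. The function name `choose_task` and the dictionary of ready tasks are assumptions for illustration; the sketch relies on the standard library’s `random.choices`, which draws items with probability proportional to the supplied weights.

```python
import random
from typing import Dict, Optional


def choose_task(ready: Dict[str, int], rng=random) -> Optional[str]:
    """Pick the task to run this time step.

    `ready` maps each task in the agent's ready queue to its priority
    (1 to 10). Each task is selected with probability proportional to
    its priority; None is returned when the queue is empty.
    """
    if not ready:
        return None
    tasks = list(ready)
    weights = [ready[task] for task in tasks]
    return rng.choices(tasks, weights=weights, k=1)[0]
```

Because every ready task has a nonzero weight, low priority tasks still receive processor time occasionally, which is the fairness and progress property noted above.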

4.2.  Distributed  LoE 

The  main  contribution  of  this  paper  is  the  DLoE 
algorithm.  The  DLoE  algorithm  attempts  to  achieve 
maximum  work  throughput  by  forming  coalitions  for 
tasks.  Since  the  algorithm  is  distributed  and  designed 
to  operate  in  large-scale  systems,  it  does  not  use  global 
knowledge. 

Contrary to most coalition formation algorithms, the agents in the DLoE have
no decision authority about which task coalitions they join. Rather, the
distributed algorithm decides the task allocation strategy that best
benefits the system work throughput. The DLoE algorithm borrows from the
military command paradigm: centralized authority, decentralized execution.
The execution of the algorithm itself is distributed; however, the authority
it carries is centralized in the sense that agents are less autonomous than
in other approaches. The objective of this paradigm is to minimize
negotiations resulting in coalition formation, thus reducing the overhead of
the algorithm, and yielding higher system work throughput.

The DLoE algorithm builds on the RC-Chord HP2P structure. At the super peer
level, in addition to tracking resource amounts, the super peers track the
total priority points (TPP) of the nodes in their cluster and the average
TPP for connected clusters. The total priority points are the sum of the
priority values of all of the tasks at an agent.


Tasks are introduced to the system at any agent. When a task enters the
system, the task is sent to a cluster super peer. If the super peer needs
resources that are not available in its cluster or the clusters beneath it,
it forwards the task to the super cluster. A super cluster super peer then
locates agents that can satisfy the task’s requirements. The super cluster
can be reached by any agent in the system in $O(\log_m(N))$ steps, where $m$
is the Chord address width per cluster. The DLoE algorithm recursively
searches down the most likely sub-graphs to locate agents capable of
satisfying resource requirements for the new task.

Algorithm 1 shows the execution of the DLoE search algorithm to locate the
best agent in the system to which to assign a new task. The distributed
recursive algorithm executes on a super peer in a cluster. It begins by
initializing the TPP to be the average TPP of all agents in the cluster, and
examines the base condition to check if the current cluster contains the
best agent for the new task. The DLoE coalition formation algorithm attempts
to maximize the likelihood that the new task receives processor time on each
agent. The algorithm assigns points to each task, $T_i$, resident on a
target agent, $I_j$, based on the priorities of those tasks:

$$\mathit{priority\_points} = \sum_{T_i \in I_j} \mathit{priority}(T_i). \tag{6}$$

The value of each agent’s $\mathit{priority\_points}$ is compared, and the
agent with the lowest value is assigned the new task. This ensures minimum
competition for processing time for the new task on the target agent, thus
maximizing its likelihood to execute, and reducing the task’s total
execution duration.
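
The selection rule can be illustrated with a short Python sketch. The function names and the mapping from agents to task priorities are hypothetical. The `expected_share` helper is an added assumption, not a formula from the paper: it applies the roulette wheel weighting of Section 4.1.1 and supposes that every resident task is ready, to show why a lower point total favors the new task.

```python
from typing import Dict, List


def priority_points(task_priorities: List[int]) -> int:
    """Equation 6: sum of the priorities of the tasks resident on an agent."""
    return sum(task_priorities)


def choose_agent_local(agents: Dict[str, List[int]]) -> str:
    """Return the agent whose resident tasks carry the fewest priority points."""
    return min(agents, key=lambda name: priority_points(agents[name]))


def expected_share(new_priority: int, resident_points: int) -> float:
    """Assumed per-step selection probability of the new task under
    priority-weighted sampling when all resident tasks are ready."""
    return new_priority / (resident_points + new_priority)
```

For instance, a priority-5 task placed on an agent with 15 resident points would be picked in one step out of four on average under these assumptions, and more often on a less loaded agent.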

The algorithm executes on a super peer, and therefore the agent has access
to the aggregate TPP information for all agents in its cluster. The
algorithm examines the LoE for the task against all agents in the current
cluster. This value is computed at a super peer without further inter-agent
communications, and is compared to the LoE for the sum of all sub-clusters
from the current cluster. A lower LoE value is more desirable, as lower
values indicate less competition in the task scheduling algorithm. Upon
finding the cluster with the lowest LoE, the algorithm sends the task to a
super peer in the cluster. The algorithm then recursively descends the
hierarchy until it encounters a leaf cluster, or it finds the agent in the
cluster with the lowest LoE.
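
The recursive descent can be sketched in Python under several simplifying assumptions that are not part of the paper: a single resource (so the resource argument is dropped), a hypothetical `Cluster` class that holds exact per-agent TPP values rather than the periodically aggregated summaries a super peer would actually see, and at least one agent in every cluster.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    agents: Dict[str, int]  # agent name -> total priority points (TPP)
    children: List["Cluster"] = field(default_factory=list)

    def subtree_tpp(self) -> int:
        return sum(self.agents.values()) + sum(c.subtree_tpp() for c in self.children)

    def subtree_nodes(self) -> int:
        return len(self.agents) + sum(c.subtree_nodes() for c in self.children)


def choose_node_recursive(cluster: Cluster) -> str:
    """Descend toward the least contended sub-graph and pick an agent there."""
    cluster_avg = sum(cluster.agents.values()) / len(cluster.agents)
    sub_nodes = sum(c.subtree_nodes() for c in cluster.children)
    sub_avg = float("inf")  # a leaf cluster has nothing below it
    if sub_nodes > 0:
        sub_avg = sum(c.subtree_tpp() for c in cluster.children) / sub_nodes
    # Base condition: the local cluster is less loaded than everything below.
    if cluster_avg < sub_avg:
        return min(cluster.agents, key=cluster.agents.get)
    # Otherwise follow the sub-cluster with the lowest average TPP.
    best = min(cluster.children, key=lambda c: c.subtree_tpp() / c.subtree_nodes())
    return choose_node_recursive(best)
```

In a deployment each recursive call would run on a super peer of the chosen sub-cluster, so the search costs one message per level of the hierarchy rather than one per agent.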
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Algorithm  1  chooseNodeRecursi ve(cluster,  Vj)  picture  of  the  TPP  for  each  sub-graph.  When  searching 

clusterTPP  =  cluster. localTPP / cluster.- numN odes  for  agents  to  satisfy  resource  requirements  for  coali- 


subNodes  =  cluster. subN  odes(ri) 
subTPP  =  MAXJNT 
si  subNumNodes  >  0.0  entonces 
subTPP  =  cluster.subTPP(ri) 

fin  si 

si  clusterTPP  <  subTPP  entonces 

best  Agent  =  choose  AgentLocal{cluster,ri) 

fin  si 

cluster\\subC  lusters  =  getSubC  luster  s() 
subBestTPPSoFar  =  max(int) 
clustersubBestSoFar  =  nil 
para  i  =  0  :  subClusters  .length  hacer 
subTest  =  subC  luster  s[i\ 
subTPP  =  subT  est  .totalT  P  P  (ri) 
subNodes  =  subTest.  node  s(ri) 
subLoE  =  subTPP /subNodes 
si  subLoE  <  subBestTPPSoFar  entonces 
subBestTPPSoFar  =  subLoE 
subBestSoFar  =  subTest 
fin  si 
fin  para 

devolver  chooseNodeRecursi ve(subBestSoFar,  r,) 


Implicit  in  the  decision  making  of  the  DLoE  algo¬ 
rithm  is  that  all  holders  of  a  resource  are  considered 
equal  in  quality,  although  perhaps  not  quantity.  Like¬ 
wise,  agents  contribute  equal  work  units  to  each  task 
they  host.  Since  resources  can  be  shared  between  tasks 
and  agents  execute  only  a  single  task  per  time  step,  an 
agent’s  resources  are  multiplexed  to  its  task  coalitions 
in  the  same  way  that  execution  time  is  shared  between 
tasks.  These  assumptions  establish  a  balance  among 
agents,  and  reduce  the  search  space  of  the  DLoE  algo¬ 
rithm. 

Each  node  in  the  system  periodically  passes  its  re¬ 
source  amounts  and  TPP  to  one  of  its  cluster’s  super 
peers.  This  information  is  aggregated  and  the  TPP  is 
tracked  according  to  resource  amount.  For  example,  in 
a  system  that  permits  resource  amounts  [0,1000],  the 
resource  interval  is  split  n  times.  For  each  of  these  n 
equally  divided  resource  intervals,  the  count  of  TPP  for 
the  cluster  is  tracked.  Along  with  the  number  of  nodes 
in  the  cluster,  again  divided  by  resource  interval,  this 
vector  of  TPP  per  resource  interval  is  all  that  is  passed 
upward  between  clusters.  This  information  continues 
an  upward  ascent  through  the  levels  of  the  HP2P  net¬ 
work  until  it  reaches  the  super  cluster.  The  DLoE  al¬ 
gorithm  uses  this  information  to  build  an  approximate 


tion  formation,  the  DLoE  heuristic  chooses  the  route 
that  minimizes  the  expected  scheduling  contention  by 
examining  the  aggregate  TPP  at  each  cluster.  Finding 
the  node  with  minimum  TPP  will  yield  the  highest 
likelihood  of  execution  for  the  new  task  and  minimize 
its  expected  task  duration. 
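
The descent over aggregated TPP values can be illustrated with a minimal Python sketch. The `Cluster` class, its field names, and the rule of stopping when the local cluster's mean TPP is no worse than that of the best sub-cluster are assumptions made for illustration and are not the RC-Chord implementation. For brevity the sketch ignores resource intervals and recomputes aggregates on demand, whereas the text describes super peers caching periodically updated vectors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Cluster:
    """Hypothetical cluster; agents maps agent name -> (resource type, TPP)."""
    name: str
    agents: Dict[str, Tuple[str, float]] = field(default_factory=dict)
    children: List["Cluster"] = field(default_factory=list)

    def local(self, res: str) -> Tuple[float, int]:
        """Total TPP and node count of local agents holding `res`."""
        tpps = [t for (r, t) in self.agents.values() if r == res]
        return sum(tpps), len(tpps)

    def total(self, res: str) -> Tuple[float, int]:
        """Aggregate TPP and node count for this cluster and its subtree."""
        tpp, n = self.local(res)
        for child in self.children:
            c_tpp, c_n = child.total(res)
            tpp, n = tpp + c_tpp, n + c_n
        return tpp, n


def mean_tpp(tpp: float, n: int) -> float:
    """Mean TPP per node; infinite when no node holds the resource."""
    return tpp / n if n > 0 else float("inf")


def choose_agent(cluster: Cluster, res: str) -> str:
    """Descend toward the lowest mean TPP; stop when the local cluster wins."""
    local_loe = mean_tpp(*cluster.local(res))
    best_child: Optional[Cluster] = None
    best_loe = float("inf")
    for child in cluster.children:
        loe = mean_tpp(*child.total(res))
        if loe < best_loe:
            best_child, best_loe = child, loe
    if best_child is None or local_loe <= best_loe:
        # Assumed stopping rule: pick the least-loaded local holder.
        candidates = {a: t for a, (r, t) in cluster.agents.items() if r == res}
        if not candidates:
            raise LookupError(f"no agent holds {res!r}")
        return min(candidates, key=candidates.get)
    return choose_agent(best_child, res)
```

Each call inspects only one cluster and the summaries of its direct sub-clusters, which is what keeps the search local rather than global.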

The data collection and dissemination accounts for a
small amount of overhead. Given 10 resource intervals
on a 64-bit architecture and two data structures (TPP
and resource availability), the information passed from
a cluster to its next higher level cluster consumes
approximately 160 bytes (10 intervals × 8 bytes × 2
structures). A system of one million agents, with 1000
agents per cluster, will have roughly 1000 clusters. This
entire periodic maintenance process therefore consumes
approximately 160KB per update period for a large-scale
system. Cluster super peers store the TPP data for members
of their cluster, updating as new information becomes
available. This results in a maximum of num_intervals × 2^m
entries stored per super peer, or 82KB of memory on a
64-bit machine with 10 resource intervals and a maximum
of 1024 agents per cluster (m = 10).
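
The overhead figures above can be reproduced with a few lines of arithmetic. The function names below are hypothetical, and the sketch assumes one 8-byte value per interval entry, as implied by the 64-bit architecture in the text.

```python
BYTES_PER_VALUE = 8  # one 64-bit value per interval entry (assumption)


def update_message_bytes(num_intervals: int, num_structures: int = 2) -> int:
    """Bytes passed from one cluster to its parent per update."""
    return num_intervals * num_structures * BYTES_PER_VALUE


def system_update_bytes(num_agents: int, agents_per_cluster: int,
                        num_intervals: int) -> int:
    """Bytes sent per update period across all clusters."""
    clusters = -(-num_agents // agents_per_cluster)  # ceiling division
    return clusters * update_message_bytes(num_intervals)


def super_peer_store_bytes(num_intervals: int, m: int) -> int:
    """Upper bound on TPP storage at one super peer with 2**m agents."""
    return num_intervals * (2 ** m) * BYTES_PER_VALUE


print(update_message_bytes(10))                  # per-cluster message
print(system_update_bytes(1_000_000, 1000, 10))  # whole system, one period
print(super_peer_store_bytes(10, 10))            # one super peer, m = 10
```

The last value is 81,920 bytes, which the text rounds to 82KB.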


5.  Experimental  Setup 

To demonstrate the effectiveness of the DLoE algorithm
for large-scale DMAS coalition formation, an RC-Chord
implementation is created to operate within the Peersim
[16] P2P simulator. The coalition formation algorithms
in the simulations are built upon the RC-Chord overlay,
and experiments evaluate systems of one million simulated
nodes. The RC-Chord HP2P structure consists of up to 4096
nodes per cluster (m = 12) and one super peer for each
512 peers to handle network churn, both tunable parameters.
These numbers were chosen based on experimental results
to minimize mean hop length between nodes, and by choosing
a maximum cluster size that maintains a balance between
internal maintenance overhead and the overall number of
clusters in the system [17].

Experiments last 15,000 time units each, which is a
suitable length of time for data trends to become stable.
At each time step, tasks are allocated according to the
desired load for the simulation. A task completes when
its required number of work units has been executed by
its coalition’s agents. Each experiment is run 30 times
to provide a means for statistical comparison.

During execution of the experiments, the system undergoes
a churn rate proportional to the size of the system. At
each time step, approximately 1% of the agents in the
system, chosen randomly, are removed from the system, and
the same number of agents is reintroduced into the system
at randomly chosen locations. This churn helps to ensure
that the underlying network management protocols,
including super peer nomination and promotion, are
functioning properly. Any coalitions that are damaged as
a result of losing a member agent undergo a repair
mechanism, and one or more new agents are chosen to
replace the lost agent(s) [17].
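
One churn step of this kind might be sketched as follows. The `placement` map from agent id to cluster index, the use of fresh ids for rejoining agents, and the function name are assumptions for illustration; the coalition repair mechanism of [17] is not modelled.

```python
import random
from typing import Dict, List, Tuple


def churn_step(placement: Dict[int, int], num_clusters: int, next_id: int,
               rng: random.Random, rate: float = 0.01) -> Tuple[List[int], int]:
    """Remove about `rate` of the agents and add the same number back.

    placement maps agent id -> cluster index and is updated in place.
    Returns the removed agent ids and the next unused agent id.
    """
    k = round(len(placement) * rate)
    removed = rng.sample(sorted(placement), k)  # agents chosen at random
    for agent in removed:
        del placement[agent]
    for agent in range(next_id, next_id + k):
        # New agents join at a randomly chosen location (cluster).
        placement[agent] = rng.randrange(num_clusters)
    return removed, next_id + k
```

Because removals and arrivals are equal in number, the system size stays constant while membership changes every step.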

These experiments compare three coalition formation
algorithms:

-  CRP: An agent is chosen at random. If that agent
meets the task’s resource requirements, then it is
added to the task coalition [2].

-  CLoE:  Tasks  are  allocated  to  agents  to  maximize 
the  probability  of  the  task  receiving  processor 
time  slots.  This  centralized  algorithm  uses  global 
knowledge. 

-  DLoE:  Similar  to  the  CLoE  algorithm,  except  that 
global  knowledge  is  removed  and  the  algorithm  is 
distributed. 

The CRP [2] and CLoE algorithms serve as baseline
algorithms. The CRP is a centralized algorithm with
full knowledge [19,31,36,38,40]. The CRP algorithm
randomly chooses agents to include in a task coalition.
However, this process can fail, as the targeted agent
may not have the proper resource type, or may have
an insufficient quantity of the resource. As such, the
CRP algorithm consumes additional bandwidth and is
the only algorithm that can miss.
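
A minimal sketch of random solicitation with miss counting is given below. The agent record layout, the function name, and the `max_tries` cutoff are assumptions for illustration; the source does not state how many attempts CRP makes before giving up.

```python
import random
from typing import Dict, Optional, Tuple

Agent = Tuple[str, int]  # (resource type, quantity); hypothetical record


def crp_solicit(agents: Dict[int, Agent], res: str, amount: int,
                rng: random.Random,
                max_tries: int = 100) -> Tuple[Optional[int], int]:
    """Pick agents uniformly at random until one satisfies the requirement.

    Returns (agent id or None, number of misses).
    """
    ids = list(agents)
    misses = 0
    for _ in range(max_tries):
        candidate = rng.choice(ids)
        kind, quantity = agents[candidate]
        if kind == res and quantity >= amount:
            return candidate, misses
        misses += 1  # wrong resource type or insufficient quantity
    return None, misses
```

Every miss corresponds to a solicitation message that produced no coalition member, which is the source of the additional bandwidth noted above.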

The CLoE optimizes a task’s LoE with global knowledge.
The CLoE algorithm is an adaptation of the CNP [25],
modified to operate in an environment where agents are
required to volunteer. The advantage of global knowledge
for CLoE is that no ratio averaging is used in the
decision process. Instead, the algorithm identifies the
agent in the system with the lowest TPP by examining
every agent in the system at each coalition formation
point. With this global knowledge, the CLoE algorithm
serves as an optimal baseline. Because the CLoE operates
in simulation, the global knowledge is available.
However, in a real world system the message passing to
the central server would cause a denial of service.
Although the CLoE algorithm provides accurate results
using its global centralized search, the key contribution
of the DLoE algorithm is that its performance is close
to that of the CLoE algorithm while remaining tractable
for large-scale distributed multi-agent systems.
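
For contrast, the global scan performed by a centralized selector can be sketched in a few lines. The `AgentState` layout and the tie-break on agent id are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple

# (resource type, quantity, current TPP); a hypothetical agent record
AgentState = Tuple[str, int, float]


def cloe_select(agents: Dict[int, AgentState], res: str,
                amount: int) -> Optional[int]:
    """Scan every agent and return the eligible one with the lowest TPP."""
    eligible = [(tpp, agent_id)
                for agent_id, (kind, qty, tpp) in agents.items()
                if kind == res and qty >= amount]
    # Ties on TPP fall back to the lowest agent id (an arbitrary choice).
    return min(eligible)[1] if eligible else None
```

The scan touches every agent for each selection, so its cost grows linearly with system size, and in a deployed system every agent's state would have to reach one server.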

5.1.  Factors 

Table 1 describes the factors in these simulations.
The objective of testing under these conditions is to
exercise the critical parts of the coalition formation
algorithms, and examine the resulting effectiveness
metrics.

Table 1

Simulation Factors

Variable                Range
Algorithm               CRP, CLoE, DLoE
Task Synchrony (S_i)    -1, 1, 2, 3, 4, 5, 10, 15 (steps)
Load                    500, 1000, 1500 (tasks per step)
DLoE Update Interval    0, 5, 10, 25, 50 (steps)

Tasks are allocated uniformly across each simulation
time line, with task priorities following the
probabilities shown in the bi-modal distribution of
Figure 2. The dominant distribution has a mean of 3.5
(σ = 1.4) and represents the creation of tasks during
normal operations. The target application domain for
this research is that of network operations, to include
network defense. As a result, the task priority
distribution also includes a second mode of higher
priority, and lower frequency, which represents
unpredictable and emergent tasks, such as handling a
network attack. This second Gaussian distribution has
mean 8.0 (σ = 1.5) and represents real world situations
that induce a sudden burst of task creations. Tasks
assigned into this category are centered around priority
eight, and represent a small percentage of all tasks.
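
Sampling from such a two-component mixture can be sketched as follows. The means and standard deviations come from the text; the 5% weight of the high-priority mode is an assumption, since the text says only that it is a small percentage. Rounding or clamping to a valid priority range is omitted because that range is not stated here.

```python
import random

NORMAL_MODE = (3.5, 1.4)    # mean, standard deviation (from the text)
EMERGENT_MODE = (8.0, 1.5)  # mean, standard deviation (from the text)
EMERGENT_WEIGHT = 0.05      # assumption: "a small percentage" of tasks


def sample_priority(rng: random.Random) -> float:
    """Draw one task priority from the bi-modal Gaussian mixture."""
    if rng.random() < EMERGENT_WEIGHT:
        mean, sigma = EMERGENT_MODE
    else:
        mean, sigma = NORMAL_MODE
    return rng.gauss(mean, sigma)
```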

The primary loading factor is the number of tasks
created per time step (tasks per step, tps). The
incoming task work load, or work generated, is measured
in units of work per unit time, with each task requiring
between 750 and 1000 units of total work to complete.
Given the optimal work throughput of the system at
one million units of work per unit time, the values of
500tps, 1000tps, and 1500tps represent under-loaded,
critically-loaded, and over-loaded systems. The
objective of these values is to measure the effectiveness
of each algorithm in these scenarios.
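
The relationship between the three load levels and system capacity can be checked with simple arithmetic. The sketch assumes that the work per task is uniformly distributed between 750 and 1000 units, which gives a mean of 875; the text states only the range.

```python
CAPACITY = 1_000_000  # units of work per time step (from the text)
MEAN_WORK_PER_TASK = (750 + 1000) / 2  # assumes a uniform distribution


def offered_load(tasks_per_step: int) -> float:
    """Expected units of work generated per time step."""
    return tasks_per_step * MEAN_WORK_PER_TASK


for tps in (500, 1000, 1500):
    load = offered_load(tps)
    print(tps, load, load / CAPACITY)
```

Under this assumption the 1000tps case generates about 875,000 units of work per step, which appears consistent with the throughput of roughly 877,000 reported for CLoE in Table 2, while 1500tps exceeds capacity by about 31%.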

The DLoE level heuristic algorithm has one additional
critical process variable: the update interval. The
update interval is the duration, in simulation time
steps, between updates of DLoE shared resource data
(Section 4.2). Unless otherwise noted, the level
heuristic update interval is kept at five, a value
determined experimentally to minimize network
maintenance bandwidth usage while still providing super
peers with accurate approximations of cluster TPPs [17].

Fig. 2. Simulation task priority distribution. This is a bi-modal
mixture model of two Gaussian distributions, one for standard
operating missions, and a second for low probability, high priority
missions.

5.2.  Control  Factors 

Each agent is assigned one of five resources, with
a quantity between 400 and 1000 units. The range of
resource quantity was chosen experimentally to ensure
that each agent could reasonably be part of multiple
task coalitions. This introduces sufficient diversity in
the system to validate the model by forcing decision
making in coalition formation algorithms, generating
multiple resource sub-graphs in the topology
construction phase, and exercising task generation by
varying the number of resources required per task.

All experiments simulate a system of one million
nodes. Each Chord cluster is given 12 bits of address
space, allowing for 4096 agents per cluster, with a
minimum of 245 clusters for a system of one million
agents (1,000,000 / 4096 ≈ 244.1, rounded up).

5.3.  Response  Variables 

The objective of these experiments is to evaluate the
effectiveness of the DLoE heuristic algorithm against
the baseline uniform (CRP) and optimal (CLoE)
algorithms. Primary measures of this effectiveness are
the agent solicitation miss rate, the sustained workload
performance of the system, and the global balanced
utilization of resources.


6.  Results  and  Analysis 

This analysis evaluates the effectiveness of the
DLoE algorithm by comparing its results to the CRP
and CLoE algorithms. The CRP algorithm serves as a
baseline, and the CLoE algorithm serves as the
work-optimal reference.

Table 2 shows the work throughput of the three
algorithms on the critically-loaded scenario across the
different task synchrony levels. The results are
organized by algorithm type and task synchrony. Task
synchrony is the number of time steps between each
synchronization barrier, with a larger value indicating
that the task reaches a barrier less often. The CLoE
algorithm serves as the theoretical best performance for
the LoE approach, and DLoE tracks those results
competitively for each experiment. The CRP algorithm
suffers from choosing agents at random to join tasks:
even though it generates a more uniform distribution of
tasks to agents, it neglects the priorities of each task.
As a result, agents maintain a uniform number of tasks,
but there is more contention in executing tasks on
those agents, and the overall work throughput suffers.
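
To make the comparison concrete, the relative throughput shortfall against CLoE can be computed directly from the means reported in Table 2. The dictionary layout and the function name are hypothetical; the numbers are copied from the table.

```python
# Mean work throughput from Table 2, keyed by synchrony level.
TABLE_2 = {
    "None": {"DLoE": 876779.98, "CLoE": 877000.25, "CRP": 856961.36},
    "1":    {"DLoE": 867308.54, "CLoE": 877024.73, "CRP": 631134.59},
    "5":    {"DLoE": 866507.99, "CLoE": 876975.48, "CRP": 631461.40},
    "10":   {"DLoE": 870917.79, "CLoE": 877013.30, "CRP": 636517.08},
    "25":   {"DLoE": 872776.62, "CLoE": 877019.55, "CRP": 649992.75},
    "50":   {"DLoE": 873027.80, "CLoE": 876959.47, "CRP": 678106.20},
}


def gap_to_cloe(level: str, algorithm: str) -> float:
    """Relative throughput shortfall versus CLoE, in percent."""
    row = TABLE_2[level]
    return 100.0 * (row["CLoE"] - row[algorithm]) / row["CLoE"]


for level in TABLE_2:
    print(level,
          round(gap_to_cloe(level, "DLoE"), 2),
          round(gap_to_cloe(level, "CRP"), 2))
```

With these means, the DLoE shortfall stays below about 1.2 percent at every synchrony level, while CRP falls roughly 23 to 28 percent short once synchrony is enabled.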

Figure 3 shows the work throughput of the three
algorithms on the over-loaded scenario with task
synchrony disabled. Both the CLoE and DLoE algorithms
approach the practical maximum of one million units
of work executed per unit time, which matches the best
scenario work creation rate. The CRP algorithm is
unable to reach this milestone. This is one of the
benefits of both the CLoE and DLoE algorithms: they try
to allocate tasks to agents that meet task resource
requirements with the least excess. Without any
heuristic, the CRP algorithm simply accepts the first
agent that can satisfy a task’s requirements, regardless
of the excess resource amounts possessed by the agent.

Figure 4 shows the number of tasks allocated per
agent for the three algorithms in a critically-loaded
system with synchrony disabled. Agents in the systems
with the CLoE and DLoE algorithms have a lower
number of tasks per agent as a result of the higher
overall work throughput. Since tasks complete more
quickly, the agents are able to satisfy the incoming
workload more easily, and thus have fewer tasks. The
CRP algorithm has a lower work execution rate, and
therefore maintains more tasks per agent. This
continues until a saturation point is reached wherein
the randomness of the CRP coalition formation algorithm
eventually provides enough tasks to agents with lower
amounts of resources to meet the incoming workload.

Table 2

Impact of Task Synchrony on Work Throughput. Shown are work
throughput mean (standard deviation) for critically-loaded (1000
tps) systems of varying task synchrony. Task synchrony is the
number of steps between each barrier synchronization point, with
higher values meaning the barrier is met less often.

Synchrony Level   DLoE                  CLoE                 CRP
None              876779.98 (2473.38)   877000.25 (463.76)   856961.36 (5824.41)
1                 867308.54 (4246.03)   877024.73 (471.87)   631134.59 (2414.76)
5                 866507.99 (3521.34)   876975.48 (458.81)   631461.40 (3475.70)
10                870917.79 (2724.06)   877013.30 (457.48)   636517.08 (4115.23)
25                872776.62 (1687.70)   877019.55 (446.20)   649992.75 (3316.01)
50                873027.80 (3166.34)   876959.47 (449.23)   678106.20 (5013.18)

Fig. 3. Comparison of the three algorithms in an over-loaded
system (1500tps) with synchrony disabled. The work throughput of
the CLoE and DLoE is comparable, whereas the CRP algorithm
performs noticeably worse (ANOVA p=0).

Fig. 4. Comparison of the three algorithms in a critically-loaded
system (1000tps) with synchrony disabled. The number of tasks per
agent for the CRP algorithm eventually stabilizes at a significantly
higher value than the CLoE or DLoE algorithms.


Table 3 shows the impact of the update interval on
the mean TPP per RC-Chord level. The results show
a strong dependence on the update interval, revealing
the impact of rapid task creation and TPP caching in
large-scale coalition formation. For systems with one
million agents, over 977,000 reside in level three. For
the critically-loaded DLoE systems shown in Table 3,
level one shows the highest mean variance against the
baseline level three due to its small size. Reducing the
frequency of DLoE updates reduces the accuracy of
TPP balancing because the coalition formation algorithm
is forced to use TPP data that has become
unrepresentative of the system’s current state.
Increasing the update frequency significantly improves
the coalition formation algorithm accuracy at the
expense of increased bandwidth consumption.

Under significant loading, the task duration for the
DLoE and CLoE algorithms resembles Figure 5 for all
levels of task synchrony. The data show high priority
tasks complete with shorter mean duration than lower
priority tasks. The excessive loading in these
experiments creates a higher number of tasks per agent,
and therefore the task scheduler relies primarily on
priority to choose which task to execute at each time
step.
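
The effect of priority-driven scheduling on task duration can be illustrated with a single-agent sketch. It assumes a strict highest-priority-first policy with one work unit executed per time step; the text says only that the scheduler relies primarily on priority, so the policy and the function name are simplifications for illustration.

```python
from typing import Dict, List, Tuple


def run_agent(tasks: List[Tuple[str, int, int]]) -> Dict[str, int]:
    """Simulate one agent that executes one work unit per time step.

    tasks holds (name, priority, work units). Returns the time step at
    which each task completes under strict priority scheduling.
    """
    remaining = {name: work for name, _, work in tasks if work > 0}
    priority = {name: prio for name, prio, _ in tasks}
    finished: Dict[str, int] = {}
    step = 0
    while remaining:
        step += 1
        # Assumed policy: always run the highest-priority pending task.
        current = max(remaining, key=lambda n: priority[n])
        remaining[current] -= 1
        if remaining[current] == 0:
            finished[current] = step
            del remaining[current]
    return finished
```

When an agent hosts several tasks at once, lower-priority tasks wait for every higher-priority task ahead of them, so their durations grow with the backlog.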

For under-loaded systems, task durations remain
relatively uniform across all task priorities. This
occurs because agents in those systems are able to meet
the incoming workload, and so the scheduler is rarely
forced to choose which task to execute based solely on
priority.

A notable exception to this trend is the CRP algorithm,
which poorly places its tasks in all cases, leading to
a high standard deviation for the number of tasks per
agent. This trend is shown in Figure 6. Under the CRP
algorithm, many agents are overloaded, while others
have zero load. This leads to bottlenecks at those
higher loaded agents, creating a net reduction in the
work executed per unit time, and forcing
critically-loaded systems to become unable to meet the
incoming task workload.

D. Karrels et al. / Large-scale cooperative task distribution on peer-to-peer networks

Table 3

Mean TPP per agent at each level measured against the DLoE update interval.

RC-Chord Level    Update Interval (time steps)
Number            0       1       5       10      25      50
Level 1           5.15    5.28    10.22   15.59   54.87   218.79
Level 2           4.46    4.61    7.65    10.71   25.13   48.03
Level 3           4.22    4.38    6.12    7.74    15.92   28.76

Fig. 5. Task duration versus priority for critically-loaded (1000tps)
and over-loaded systems (1500tps). The trends are identical for both
the CLoE and DLoE coalition formation algorithms. The duration
of tasks reduces with increased priority when the task scheduler is
forced to decide based on priority.

Fig. 6. Task duration versus priority for a critically-loaded system
(1500tps) using the CRP coalition formation algorithm. The poor
assignment of agents to task coalitions yields bottlenecks in systems
with task synchrony, causing delays in processing and eliminating
the effectiveness of the task scheduling algorithm.

Figure 7 shows the statistical comparison of the
work throughput versus task synchrony for
critically-loaded experiments where synchrony is
disabled, and Figure 8 shows the same scenario with task
synchrony enabled. In both cases, the CRP algorithm
performs far worse than the other two algorithms. The
CLoE and DLoE algorithms have similar median work
throughput (within one standard deviation of one
another), but the CLoE algorithm yields lower standard
deviation with fewer outliers due to higher coalition
formation quality as a result of global knowledge.


Fig. 7. Boxplot for work throughput versus task synchrony in
critically-loaded experiments where synchrony is disabled. The
median performance range between the DLoE and CLoE algorithms is
similar; however, the standard deviation and number of outliers for
DLoE are greater than those of the CLoE algorithm. The CRP
algorithm yields lower performance than both other algorithms.

The results of these experiments demonstrate that
task synchrony plays an important part in the overall
performance of coalitions generated using these
different algorithms. The CRP algorithm, in particular,
suffers from poor performance as a result of its
coalition formation process. In addition, the CRP
algorithm suffers from the highest miss rate of the
algorithms considered, reaching over 80% in some
experiments.

The CLoE demonstrates the best overall sustained
task execution rate, as well as the lowest standard
deviation. The DLoE algorithm is statistically similar
in mean work throughput, with slightly higher variance.
The DLoE algorithm successfully tackles the challenge
of task synchrony, and allows the task scheduler to
make good decisions that reduce the durations of
tasks with higher priorities during high workloads.

Fig. 8. Boxplot for work throughput versus task synchrony in
critically-loaded experiments where synchrony is enabled. The CLoE
and DLoE algorithms perform far better than the CRP algorithm due
to coalition formation heuristics.

7.  Conclusion 

As highly networked entities seek to leverage the
power and scale of P2P systems and their data, the
difficulty of efficiently sharing the capabilities and
assets of the attached systems becomes critically
important. This paper presents the Distributed
Likelihood of Execution algorithm, which uses the
cooperative coalition formation problem as a framework
for tasking agents. The DLoE algorithm relies upon the
facilities of the RC-Chord structured overlay,
specifically the ability to allocate agents into
clusters organized by resource or capability. The DLoE
algorithm stores task information for agents with each
cluster, and passes this information up the network
hierarchy to construct a general view of the task
loading on each sub-graph. The DLoE algorithm uses this
information to decide how far, and in which direction,
to descend in search of agents to satisfy task
allocation requests. Simulation results indicate that
our distributed algorithm performs nearly as well as an
omnipotent centralized optimal algorithm, and
significantly better than the baseline algorithm.

Potential improvements of the DLoE algorithm include
building a new strategy for allocating the number and
spacing of its resource intervals. These intervals are
currently static, and only configurable before runtime.
This can be improved by incorporating a distributed
learning algorithm to tune the number and range of the
DLoE tracking intervals at runtime. In addition, work
contributed by different agents should ideally
contribute to different pools of work within the task.
At present, units of work are considered equal, and
although this satisfies the target deployment, this may
not be the case in all applications.